A Statistical Model for Hangeul-Hanja Conversion in Terminology Domain

Authors

Jin-Xia Huang

Sun-Mee Bae

Key-Sun Choi

Abstract

Sino-Korean words, which were historically borrowed from Chinese, can be written in either Hanja (Chinese characters) or Hangeul (Korean characters). Previous Korean Input Method Editors (IMEs) provide only a simple dictionary-based approach to Hangeul-Hanja conversion. This paper presents a sentence-based statistical model for Hangeul-Hanja conversion, with word tokenization included as a hidden process. As a result, the model reaches 91.4% character accuracy and 81.4% word accuracy in the terminology domain when only very limited Hanja data is available.
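
To make "word tokenization as a hidden process" more concrete, here is a minimal sketch, not the authors' model: a dynamic program that jointly picks a segmentation of the Hangeul input and a Hanja rendering for each segment by maximizing the sum of log-probabilities. The lexicon and its probabilities, `KEEP_HANGEUL_PROB`, and `MAX_WORD_LEN` are hypothetical toy values, and a simple per-word (unigram) score stands in for whatever sentence-level model the paper actually uses.

```python
import math

# Toy lexicon with made-up probabilities (hypothetical, NOT the paper's data):
# Hangeul string -> list of (Hanja rendering, probability).
TOY_LEXICON = {
    "정보": [("情報", 0.02)],
    "검색": [("檢索", 0.01)],
    "정": [("情", 0.001), ("正", 0.002)],
    "보": [("報", 0.001), ("步", 0.001)],
    "검": [("劍", 0.001)],
    "색": [("色", 0.002)],
}

# Hypothetical probability of leaving one syllable in Hangeul (native words, particles).
KEEP_HANGEUL_PROB = 1e-4
# Hypothetical upper bound on word length, in syllables.
MAX_WORD_LEN = 4


def convert(sentence: str) -> str:
    """Jointly choose a segmentation and a rendering for each segment (best-path DP)."""
    n = len(sentence)
    # best[i] holds (log-probability, output) of the best analysis of sentence[:i].
    best = [(-math.inf, "")] * (n + 1)
    best[0] = (0.0, "")
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_WORD_LEN), i):
            prev_score, prev_out = best[j]
            if prev_score == -math.inf:
                continue
            span = sentence[j:i]
            candidates = list(TOY_LEXICON.get(span, []))
            if len(span) == 1:
                # A single syllable may always stay unconverted.
                candidates.append((span, KEEP_HANGEUL_PROB))
            for rendering, prob in candidates:
                score = prev_score + math.log(prob)
                if score > best[i][0]:
                    best[i] = (score, prev_out + rendering)
    return best[n][1]


print(convert("정보검색"))  # 情報檢索
print(convert("정보를"))    # 情報를
```

Because any single syllable may stay in Hangeul, every input has at least one analysis, so the segmentation never has to be fixed in advance. A sentence-level model with context (for example, word bigrams) would replace the per-word `math.log(prob)` term with a score that also depends on the previous word.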

Similar sources

Missionary contributions toward the revaluation of Hangeul in late nineteenth-century Korea

Soon after their arrival in Korea, Christian missionaries were confronted by decisions about how to present written materials to the Korean people. While many Koreans used their indigenous script (Hangeul) for everyday purposes, higher-status literacy materials were expected to be presented using Chinese characters (Hanja), a system unfamiliar to most but considered more prestigious...

Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems

This article examines methods for optimizing the performance of GMM-based voice conversion systems, in which the GMM method is introduced as the basic method for improving voice conversion performance. In current methods, because a single conversion function is used to convert all speech units and statistical averaging causes spectral smoothing, we will observe quality ...

Sampling Rate Conversion in the Discrete Linear Canonical Transform Domain

Sampling rate conversion (SRC) is one of the important issues in modern sampling theory. It can be realized by up-sampling, filtering, and down-sampling operations, which have high computational complexity. Although some efficient algorithms have been presented for sampling rate conversion, they all need to compute the N-point original signal to obtain the up-sampled or down-sampled signal in the tim...
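
The conventional route this abstract refers to (up-sample, filter, down-sample) can be sketched for ordinary discrete-time signals as below. This is a plain time-domain rational resampler with a Hamming-windowed sinc low-pass filter, not the paper's LCT-domain algorithm; the function names and the default of 63 taps are illustrative choices.

```python
import math


def lowpass_taps(cutoff: float, num_taps: int) -> list[float]:
    """Hamming-windowed sinc low-pass FIR; `cutoff` is in cycles per sample (0 < cutoff < 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def convolve_same(x: list[float], h: list[float]) -> list[float]:
    """Linear convolution trimmed to len(x), compensating the (len(h) - 1) // 2 sample delay."""
    delay = (len(h) - 1) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            idx = n + delay - k
            if 0 <= idx < len(x):
                acc += hk * x[idx]
        out.append(acc)
    return out


def resample(x: list[float], up: int, down: int, num_taps: int = 63) -> list[float]:
    """Change the sampling rate of `x` by the rational factor up/down."""
    # 1. Up-sample: insert (up - 1) zeros after every input sample.
    stuffed = []
    for sample in x:
        stuffed.append(sample)
        stuffed.extend([0.0] * (up - 1))
    # 2. Low-pass filter at the up-sampled rate; the cutoff removes the images created
    #    by zero-stuffing and anything that would alias after down-sampling.
    cutoff = 0.5 / max(up, down)
    h = lowpass_taps(cutoff, num_taps)
    # Gain `up` restores the amplitude lost to zero-stuffing.
    filtered = [up * v for v in convolve_same(stuffed, h)]
    # 3. Down-sample: keep every `down`-th sample.
    return filtered[::down]
```

Calling `resample(signal, 3, 2)` returns 1.5 samples per input sample. This naive version filters every sample at the up-sampled rate and then discards most of them, which shows where the cost mentioned above comes from; polyphase implementations avoid computing the discarded outputs.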

Unicode Canonical Decomposition for Hangeul Syllables in Regular Expression

Owing to their high expressiveness, regular expressions are frequently used in searching and manipulating text-based data. Regular expressions are highly applicable to processing Latin-alphabet text, but the same cannot be said for Hangeul∗, the writing system of the Korean language. Although Hangeul possesses alphabetic features within the script, expressiveness of regular expression pat...
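
Hangeul syllable blocks are precomposed code points, so a pattern cannot match an individual consonant or vowel inside them. A minimal sketch of canonical decomposition, following the arithmetic Hangul syllable decomposition defined in the Unicode Standard, is shown below; `decompose_syllable` and `syllables_with_initial` are hypothetical helper names, not the paper's implementation.

```python
import re
import unicodedata

# Constants of the Unicode Standard's arithmetic Hangul syllable decomposition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def decompose_syllable(ch: str) -> str:
    """Split one precomposed syllable into conjoining jamo; other characters pass through."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    tail = chr(T_BASE + t_index) if t_index else ""  # t_index 0 means no final consonant
    return lead + vowel + tail


def syllables_with_initial(text: str, initial: str) -> list[str]:
    """Return the syllables of `text` whose leading consonant is the jamo `initial`."""
    decomposed = "".join(decompose_syllable(ch) for ch in text)
    # Leading consonant, one vowel (U+1161-U+1175), then an optional final (U+11A8-U+11C2).
    pattern = re.escape(initial) + "[\u1161-\u1175][\u11a8-\u11c2]?"
    # Recompose each match to NFC so the caller gets ordinary syllables back.
    return [unicodedata.normalize("NFC", m) for m in re.findall(pattern, decomposed)]


print(syllables_with_initial("한국 하늘", "\u1112"))  # ['한', '하']
```

For precomposed syllables this produces the same sequence as `unicodedata.normalize("NFD", ...)`, so the library call could replace `decompose_syllable` in practice; the explicit arithmetic is shown to make the code point layout visible.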

Dissociative Disturbance in Hangul-Hanja Reading after a Left Posterior Occipital Lesion

Since the Korean language has two distinct writing systems, phonogram (Hangul) and ideogram (Hanja: Chinese characters), alexia can present with dissociative disturbances in reading between the two systems. A 74-year-old right-handed man presented with a prominent reading impairment in Hangul with agraphia of both Hangul and Hanja after a left posterior occipital-parietal lesion. He could not ...

Publication date: 2004